NSF PAR Search | NSF Public Access Repository

Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis

Yang, Yifan; Ban, Hao; Huang, Minhui; Ma, Shiqian; Ji, Kaiyi (May 2025, International Conference on Learning Representations)

Bilevel optimization has recently attracted considerable attention due to its abundant applications in machine learning problems. However, existing methods rely on prior knowledge of problem parameters to determine stepsizes, resulting in significant effort in tuning stepsizes when these parameters are unknown. In this paper, we propose two novel tuning-free algorithms, D-TFBO and S-TFBO. D-TFBO employs a double-loop structure with stepsizes adaptively adjusted by the "inverse of cumulative gradient norms" strategy. S-TFBO features a simpler fully single-loop structure that updates three variables simultaneously with a theory-motivated joint design of adaptive stepsizes for all variables. We provide a comprehensive convergence analysis for both algorithms and show that D-TFBO and S-TFBO respectively require $$\mathcal{O}(\frac{1}{\epsilon})$$ and $$\mathcal{O}(\frac{1}{\epsilon}\log^4(\frac{1}{\epsilon}))$$ iterations to find an $$\epsilon$$-accurate stationary point, (nearly) matching their well-tuned counterparts using the information of problem parameters. Experiments on various problems show that our methods achieve performance comparable to existing well-tuned approaches, while being more robust to the selection of initial stepsizes. To the best of our knowledge, our methods are the first to completely eliminate the need for stepsize tuning, while achieving theoretical guarantees.
Free, publicly-accessible full text available May 1, 2026
First-Order Federated Bilevel Learning

https://doi.org/10.1609/aaai.v39i21.34355

Yang, Yifan; Xiao, Peiyao; Ma, Shiqian; Ji, Kaiyi (April 2025, Proceedings of the AAAI Conference on Artificial Intelligence)

Federated bilevel optimization (FBO) has garnered significant attention lately, driven by its promising applications in meta-learning and hyperparameter optimization. Existing algorithms generally aim to approximate the gradient of the upper-level objective function (hypergradient) in the federated setting. However, because of the nonlinearity of the hypergradient and client drift, they often involve complicated computations. These computations, like multiple optimization sub-loops and second-order derivative evaluations, end up with significant memory consumption and high computational costs. In this paper, we propose a computationally and memory-efficient FBO algorithm named MemFBO. MemFBO features a fully single-loop structure with all involved variables updated simultaneously, and uses only first-order gradient information for all local updates. We show that MemFBO exhibits a linear convergence speedup with milder assumptions in both partial and full client participation scenarios. We further implement MemFBO in a novel FBO application for federated data cleaning. Our experiments, conducted on this application and federated hyper-representation, demonstrate the effectiveness of the proposed algorithm.
Free, publicly-accessible full text available April 11, 2026
Tuning-Free Bilevel Optimization: New Algorithms and Convergence Analysis

Yang, Yifan; Ban, Hao; Huang, Minhui; Ma, Shiqian; Ji, Kaiyi (April 2025, ICLR)

Free, publicly-accessible full text available April 1, 2026
First-Order Minimax Bilevel Optimization

Yang, Yifan; Si, Zhaofeng; Lyu, Siwei; Ji, Kaiyi (December 2024, Advances in Neural Information Processing Systems)

Multi-block minimax bilevel optimization has been studied recently due to its great potential in multi-task learning, robust machine learning, and few-shot learning. However, due to the complex three-level optimization structure, existing algorithms often suffer from issues such as high computing costs due to the second-order model derivatives or high memory consumption in storing all blocks’ parameters. In this paper, we tackle these challenges by proposing two novel fully first-order algorithms named FOSL and MemCS. FOSL features a fully single-loop structure by updating all three variables simultaneously, and MemCS is a memory-efficient double-loop algorithm with cold-start initialization. We provide a comprehensive convergence analysis for both algorithms under full and partial block participation, and show that their sample complexities match or outperform those of the same type of methods in standard bilevel optimization. We evaluate our methods in two applications: the recently proposed multi-task deep AUC maximization and a novel rank-based robust meta-learning. Our methods consistently improve over existing methods with better performance over various datasets.
Full Text Available
Deep Learning for Precipitation Retrievals Using Combined Measurements from GOES-16 and GOES-18 Satellites

https://doi.org/10.1109/IGARSS53475.2024.10641567

Yang, Yifan; Chen, Haonan; Kuo, Kwo-Sen (July 2024, IEEE)

Full Text Available
Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications

Yang, Yifan; Emer, Joel S; Sanchez, Daniel (July 2024, IEEE)

Full Text Available
Azul: An Accelerator for Sparse Iterative Solvers Leveraging Distributed On-Chip Memory

https://doi.org/10.1109/MICRO61859.2024.00054

Feldmann, Axel; Golden, Courtney; Yang, Yifan; Emer, Joel S; Sanchez, Daniel (November 2024, IEEE)

Full Text Available
Babel: A Scalable Pre-trained Model for Multi-Modal Sensing via Expandable Modality Alignment

https://doi.org/10.1145/3715014.3722068

Dai, Shenghong; Jiang, Shiqi; Yang, Yifan; Cao, Ting; Li, Mo; Banerjee, Suman; Qiu, Lili (May 2025, ACM)

Free, publicly-accessible full text available May 6, 2026
PID Control-Based Self-Healing to Improve the Robustness of Large Language Models

Chen, Zhuotong; Wang, Zihu; Yang, Yifan; Li, Qianxiao; Zhang, Zheng (April 2024, Transactions on machine learning research)

Full Text Available
